BBQ: A Hand-Built Bias Benchmark for Question Answering

Parrish, Alicia; Chen, Angelica; Nangia, Nikita; Padmakumar, Vishakh; Phang, Jason; Thompson, Jana; Htut, Phu Mon; Bowman, Samuel R.

arXiv:2110.08193v2 (cs)

[Submitted on 15 Oct 2021 (v1), last revised 16 Mar 2022 (this version, v2)]

Abstract: It is well documented that NLP models learn social biases, but little work has been done on how these biases manifest in model outputs for applied tasks like question answering (QA). We introduce the Bias Benchmark for QA (BBQ), a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions relevant for U.S. English-speaking contexts. Our task evaluates model responses at two levels: (i) given an under-informative context, we test how strongly responses reflect social biases, and (ii) given an adequately informative context, we test whether the model's biases override a correct answer choice. We find that models often rely on stereotypes when the context is under-informative, meaning the model's outputs consistently reproduce harmful biases in this setting. Though models are more accurate when the context provides an informative answer, they still rely on stereotypes and average up to 3.4 percentage points higher accuracy when the correct answer aligns with a social bias than when it conflicts, with this difference widening to over 5 points on examples targeting gender for most models tested.
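The accuracy difference reported in the abstract (accuracy when the correct answer aligns with a social bias, minus accuracy when it conflicts) can be illustrated with a minimal sketch. The `Example` record, its field names, and the toy data below are assumptions made for illustration only; they are not the BBQ release format, and this is not the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Example:
    # Hypothetical record layout, assumed for this sketch only.
    correct: bool        # did the model choose the right answer?
    bias_aligned: bool   # does the right answer agree with the stereotype?


def accuracy(examples: List[Example]) -> Optional[float]:
    """Fraction answered correctly, or None for an empty group."""
    if not examples:
        return None
    return sum(e.correct for e in examples) / len(examples)


def accuracy_gap_points(examples: List[Example]) -> float:
    """Accuracy on bias-aligned minus bias-conflicting examples, in percentage points."""
    aligned = [e for e in examples if e.bias_aligned]
    conflicting = [e for e in examples if not e.bias_aligned]
    acc_aligned = accuracy(aligned)
    acc_conflicting = accuracy(conflicting)
    if acc_aligned is None or acc_conflicting is None:
        raise ValueError("need at least one example in each group")
    return 100.0 * (acc_aligned - acc_conflicting)


# Toy data (invented): 9 of 10 aligned examples correct, 8 of 10 conflicting correct.
toy = (
    [Example(True, True)] * 9
    + [Example(False, True)]
    + [Example(True, False)] * 8
    + [Example(False, False)] * 2
)
gap = accuracy_gap_points(toy)
```

A positive gap means the model is more accurate when the correct answer matches the stereotype, which is the pattern the abstract describes for informative contexts.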

Comments:	Accepted to ACL 2022 Findings. 20 pages, 10 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2110.08193 [cs.CL]
	(or arXiv:2110.08193v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2110.08193

Submission history

From: Alicia Parrish
[v1] Fri, 15 Oct 2021 16:43:46 UTC (1,587 KB)
[v2] Wed, 16 Mar 2022 01:35:45 UTC (1,231 KB)
